Analysis on single nucleotide polymorphisms of the PeTPS-(-)Apin gene in Pinus elliottii

Background Resin-tapping forests of slash pine (Pinus elliottii) have been established across Southern China owing to their high yield and good resin quality, which has led to rapid growth of the resin industry. In this study, we aimed to identify molecular markers associated with resin traits in pine trees, which may help develop marker-assisted selection (MAS). Methods The PeTPS-(-)Apin gene was cloned using two primer pairs (outer and inner). DnaSP v4.0 software was used to evaluate genetic diversity and linkage disequilibrium. SHEsis was used for haplotype analysis. SPSS was used for ANOVA and the χ² test. Results The full-length PeTPS-(-)Apin gene was 4638 bp long and encoded a 629-amino-acid protein. A total of 72 single nucleotide polymorphism (SNP) loci were found. Three SNPs (CG615, AT641 and AG3859) were significantly correlated with α-pinene content, with a contribution rate > 10%. These SNPs were used to select P. elliottii with high α-pinene content, and a realized gain of 118.0% was obtained. Conclusions The PeTPS-(-)Apin gene is not the sole determinant in selecting plus slash pines with stable production, high yield, and good quality, but it can be used as a reference for selecting other resin-producing pines and other resin components.


Introduction
Resin, synthesized in the trunk of a pine tree, is a mixture of terpenoids. It is distilled to produce liquid turpentine and solid rosin. Turpentine is a volatile essential oil, mainly a mixture of terpenes. α-Pinene, the most abundant monoterpene in turpentine, is a critical defense molecule against insects, bacteria, and mechanical injury in pines. It occurs in left- and right-handed forms but is generally present in the left-handed form in resin [20]. The left-handed α-pinene synthase gene ((-)Apin) has been cloned in P. taeda [14], P. contorta [21], and P. banksiana [21]. Studies show that (-)Apin gene expression in the pine trunk is correlated with α-pinene content in turpentine [22], which in turn is positively correlated with turpentine production and positively or negatively correlated with other turpentine components. The identification of molecular markers related to high-yield, superior-quality resin would significantly promote genetic improvement and germplasm innovation. Compared with linkage analysis, association analysis can identify important alleles that are closely related to phenotypic traits.
To search for molecular markers related to resin traits for use in marker-assisted selection (MAS), the PeTPS-(-)Apin gene was cloned, and its single nucleotide polymorphisms (SNPs) and linkage disequilibrium were analyzed. Functional SNPs were screened by ANOVA, 110 samples were typed, and the optimal scheme for selecting P. elliottii with high α-pinene content was also considered. Selection of valuable plus trees will provide excellent materials for the breeding of resin-producing pines and a theoretical basis for the sustainable development of the resin market.

Plant materials
Progeny tests were conducted at Baiyunshan forest farm (26°51′N, 115°11′E), Ji'an City, Jiangxi Province, China. The slash pines were cultivated on flat woodland (hilly red soil, subtropical climate, 90 m altitude, 1646 mm rainfall, average temperature of 18.6°C). Three progeny trials comprised 110 open-pollinated families of superior trees that were collected from three seed orchards in the United States and one in China (listed in Table 1) and then planted in the spring of 1990 at Ganzhou, Ji'an, and Jingdezhen in Jiangxi Province, Southern China. The progeny trial had a completely randomized block design with five replicates, each consisting of a four-tree plot, and the distance between trees was 4 m (Figs 1 and 2). A single plant of each family was randomly selected in block II of the test forest. If all plants of a family in that block were missing or affected by damage or pests, a single plant of the family was randomly selected from another block. A total of 110 samples from 110 families were obtained.
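The sampling rule described above (one tree per family from block II, falling back to another block when no usable tree remains) can be sketched as follows. This is only an illustration: the function name, the data layout, and the `is_healthy` flag are hypothetical and are not taken from the study's records.

```python
import random

def pick_family_sample(trees_by_block, preferred_block=2, rng=None):
    """Pick one usable tree for a family, preferring one block.

    trees_by_block maps a block number to a list of (tree_id, is_healthy)
    tuples (hypothetical layout). Returns a tree_id, or None when the
    family has no usable tree in any block.
    """
    rng = rng or random.Random()

    def healthy(block):
        return [tree for tree, ok in trees_by_block.get(block, []) if ok]

    candidates = healthy(preferred_block)
    if not candidates:
        # No usable tree in the preferred block: fall back to the others.
        candidates = [tree
                      for block in sorted(trees_by_block)
                      if block != preferred_block
                      for tree in healthy(block)]
    return rng.choice(candidates) if candidates else None

# Invented example: block II has one healthy tree, so it is always chosen.
family = {1: [("b1-t1", True)], 2: [("b2-t1", False), ("b2-t2", True)]}
print(pick_family_sample(family))
```

Passing a seeded `random.Random` instance as `rng` would make the draw reproducible.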

PeTPS-(-)Apin gene cloning
Prior to gene cloning, the tender buds or needles of P. elliottii were collected, labeled, and then stored at -80°C. DNA was extracted using a TIANGEN kit and diluted to 50 ng/μL. The working solution was kept at 4°C, and the stock solution was kept at -80°C. The PeTPS-(-)Apin gene was amplified from genomic DNA using two primer pairs, outer and inner (sequences are listed in Table 2). PCR amplification was performed using PrimeSTAR HS DNA Polymerase (Takara, Japan). The reaction volume was 10 μL in the first round of amplification (outer primers) and 50 μL in the second round (inner primers). PCR products were sequenced by the Shanghai Sango company.

SNP site analysis and typing
Sequences were analyzed using Chromas 2.3 to compare the chromatogram peaks, and Excel was used to organize the SNP typing results in an A, T, C, and G format. Insertion/deletion mutations were removed from the combined gene information file, and the results were imported into DnaSP 4.0, which was used to calculate basic SNP information such as the number of transitions and transversions, frequency, nucleotide polymorphism (π, θw), synonymous mutation diversity, non-synonymous mutation diversity, silent-site diversity, and linkage disequilibrium. SHEsis was used for haplotype analysis. ANOVA and the χ² test, the latter to evaluate whether the results conformed to Hardy-Weinberg equilibrium, were performed using SPSS.

PLOS ONE

PeTPS-(-)Apin gene sequence
After cloning, sequencing, and assembly, the full-length sequence of the PeTPS-(-)Apin gene was obtained. The gene was 4638 bp long, comprised 10 exons and 9 introns, and encoded a 629-amino-acid protein (Fig 3).

Diversity analysis of SNPs
After removing a 46 bp insertion/deletion mutation, the 4592 bp sequence was used for further analysis. A total of 72 SNP loci were observed, comprising 37 transitions and 35 transversions, and 59 SNP loci were highly informative. The average frequencies of all SNPs and of highly informative SNPs were 1/64.42 and 1/78.61, respectively, with 47 silent mutation sites and 12 non-synonymous mutation sites. Nucleotide polymorphism was low (π = 0.00276, θw = 0.00259), probably because the small sample size in this study left additional polymorphisms undetected. The diversity of silent sites (0.00316), of synonymous mutations (0.0009), and of non-synonymous mutations (0.00202) was likewise low. These results indicated that the nucleotide sequence of this gene is highly conserved. Haplotype diversity was 0.897, the non-synonymous substitution rate in the coding region (Ka) was 0.00202, the synonymous substitution rate in the coding region (Ks) was 0.00317, the minimum number of historical recombination events (Rm) was 4, and Rm/SNPs was 0.068. In addition, Ka/Ks was 0.6382, which, being less than 1, suggested that the gene was undergoing balancing selection. Furthermore, three neutrality tests (Tajima's D, Fu and Li's test, and Fay and Wu's H) were applied, giving values of 0.23743, 0.20077, and 0.1318, respectively. All three values were positive, but none reached statistical significance, indicating that the evolution of this gene is consistent with the neutral model while possibly undergoing balancing selection.
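The derived ratios in this paragraph can be checked with a few lines of arithmetic. The sketch below uses only the rounded values quoted in the text; it reproduces the reported SNP frequencies when the full 4638 bp gene length is used as the numerator, and when Rm is divided by the 59 highly informative SNPs. Because Ka and Ks are rounded here, the recomputed Ka/Ks (about 0.637) differs slightly from the reported 0.6382, which was presumably calculated from unrounded values.

```python
# Values quoted in the text (rounded as reported).
GENE_LENGTH_BP = 4638
TOTAL_SNPS = 72
INFORMATIVE_SNPS = 59
KA = 0.00202   # non-synonymous rate in the coding region
KS = 0.00317   # synonymous rate in the coding region
RM = 4         # minimum number of historical recombination events

def bp_per_snp(length_bp, n_snps):
    """Average spacing: one SNP every `length_bp / n_snps` base pairs."""
    return length_bp / n_snps

all_snp_spacing = bp_per_snp(GENE_LENGTH_BP, TOTAL_SNPS)
informative_spacing = bp_per_snp(GENE_LENGTH_BP, INFORMATIVE_SNPS)
rm_per_snp = RM / INFORMATIVE_SNPS
ka_ks = KA / KS

print(f"1 SNP per {all_snp_spacing:.2f} bp (all SNPs)")
print(f"1 SNP per {informative_spacing:.2f} bp (informative SNPs)")
print(f"Rm/SNPs = {rm_per_snp:.3f}")
print(f"Ka/Ks from rounded inputs = {ka_ks:.3f}")
```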

Linkage disequilibrium
The level of linkage disequilibrium (LD) varies greatly among species and among gene sequences. Association analysis is based on LD, and different LD levels call for different association strategies. An LD decay plot for the 59 highly informative SNP loci of the PeTPS-(-)Apin gene is shown in Fig 4. The r² value decreased to 0.2 at approximately 1000 bp and fell below 0.1 at approximately 2000 bp, indicating a high level of LD and close linkage between these markers. To better understand the linkage relationship among SNPs, an LD matrix diagram of the gene was constructed, with darker color representing stronger linkage (Fig 5). Haplotype analysis was performed based on r² values. All 110 samples were used for haplotyping, and the haplotype blocks were 150-1400 bp long. Theoretically, a block containing n biallelic SNPs can carry up to 2^n haplotypes, but the observed number of haplotypes fell well short of this theoretical value, probably because of the close linkage between SNP loci. Additionally, the genotype frequency distribution varied among haplotype blocks, and each block had 1-3 dominant genotypes (frequencies greater than 0.2). An overview of the haplotype blocks is shown in Table 3.
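As a minimal illustration of the r² statistic used above, the following sketch computes r² between two biallelic loci from phased haplotypes, together with the 2^n upper bound on the number of haplotypes in a block of n biallelic SNPs. It assumes the phase is already known; the study itself used DnaSP and SHEsis, which handle unphased genotype data, so this is not the authors' procedure. Function names and the toy haplotypes are invented.

```python
def ld_r2(haplotypes):
    """r^2 between two biallelic loci.

    `haplotypes` is a list of two-character strings, one character per
    locus (for example "AG"). The first haplotype defines the reference
    allele at each locus.
    """
    n = len(haplotypes)
    ref_a, ref_b = haplotypes[0][0], haplotypes[0][1]
    p_a = sum(h[0] == ref_a for h in haplotypes) / n
    p_b = sum(h[1] == ref_b for h in haplotypes) / n
    p_ab = sum(h[0] == ref_a and h[1] == ref_b for h in haplotypes) / n
    d = p_ab - p_a * p_b                      # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both loci must be polymorphic")
    return d * d / denom

def max_haplotypes(n_snps):
    """Upper bound on distinct haplotypes for n biallelic SNPs."""
    return 2 ** n_snps

# Toy data: complete linkage versus full independence.
print(ld_r2(["AG", "AG", "CT", "CT"]))   # alleles always travel together
print(ld_r2(["AG", "AT", "CG", "CT"]))   # all four combinations equally common
print(max_haplotypes(3))
```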

Discussion
Three SNPs (CG615, AT641 and AG3859) were selected as TagSNPs related to α-pinene content. Among them, mutations at AG3859 may lead to increased activity of some enzymes regulating the synthesis of α-pinene. In this study, 12 plus trees were selected, and the realized gain in α-pinene content was 44.39%, without decreasing W0, WP, turpentine, or β-pinene content.

Analysis of the TagSNPs
As the HapMap project progressed, a large amount of SNP information accumulated in human genome databases, and tagging SNPs (TagSNPs) had to be screened from these data to reduce interference from redundant data and locate disease-associated SNP sites more precisely [23]. The subject of this work, slash pine (P. elliottii), is a less studied species for which very little SNP data is available [24]. Loblolly pine (P. taeda) is the most thoroughly studied pine species and its complete genome sequence is available, but repeat sequences hamper the expansion of the SNP database. No GWAS of related species is expected in the short term [8]. Moreover, as a complex quantitative trait, the pine resin trait needs a ... [25].
In this study, however, we analyzed only 110 families introduced from the United States, because slash pine is not a native species. Because individuals must be distantly related, it is difficult to collect more than 200 individuals as an association population. With fewer than 300 entries in the nucleotide database, our existing SNP data cannot provide genome-wide coverage. Moreover, it is difficult to obtain more SNP data in the short term because of the enormous genome (>20,000 Mbp). Therefore, simple association tests (ANOVA) were used for association analysis, which may be controversial. Nevertheless, the PeTPS-(-)Apin gene was considered an important candidate gene for α-pinene content, and three TagSNPs (CG615, AT641 and AG3859) were associated with α-pinene content.
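To make the "simple association (ANOVA)" idea concrete, the sketch below computes a one-way ANOVA F statistic for a trait grouped by genotype at a single SNP. The study used SPSS; this plain-Python version is only illustrative, and the trait values are invented. The second return value, eta squared (between-group share of total variance), is offered as one possible reading of the "contribution rate" mentioned in the abstract; that mapping is an assumption, not something the text states.

```python
def one_way_anova(groups):
    """Return (F statistic, eta squared) for a list of numeric groups."""
    values = [v for group in groups for v in group]
    grand_mean = sum(values) / len(values)
    means = [sum(group) / len(group) for group in groups]

    # Variation between group means and within each group.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2
                    for g, m in zip(groups, means) for v in g)

    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    eta_squared = ss_between / (ss_between + ss_within)
    return f_stat, eta_squared

# Invented trait values for three genotype classes at one SNP.
by_genotype = {
    "GG": [1.0, 2.0, 3.0],
    "GA": [2.0, 3.0, 4.0],
    "AA": [5.0, 6.0, 7.0],
}
f_stat, eta_squared = one_way_anova(list(by_genotype.values()))
print(f"F = {f_stat:.2f}, eta squared = {eta_squared:.4f}")
```

A p-value would additionally require the F distribution with (2, 6) degrees of freedom for this toy data, which is omitted here to keep the sketch dependency-free.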

Molecular mechanism of mutations
Using candidate gene association analysis, we found three TagSNPs (CG615, AT641, and AG3859) that may be related to α-pinene content, and analyzed the molecular mechanisms of these three mutations (Fig 6). All three were non-synonymous mutations, located in exon 1 (haplotype block (-)APin-1) and exon 10 (haplotype block (-)APin-5): G at CG615 was mutated to C, changing arginine to proline; T at AT641 was mutated to A, changing phenylalanine to tyrosine; and G at AG3859 was mutated to A, changing arginine to lysine. We used the tools ELM and CDD to predict the possible functional sites of the PeTPS-(-)Apin gene: AG3859 is the first amino acid of the LIG_FHA ligand functional domain. This domain binds forkhead-associated (FHA) phosphopeptide ligands that consist of seven amino acids, the phosphopeptide identification domain of many regulatory proteins [26]. The mutation at AG3859 may lead to increased activity of some enzymes that regulate the synthesis of α-pinene. The functions of CG615 and AT641 have not yet been predicted, but the mutations lie at the N-terminus of the protein, near the conserved domain of the TPS gene families, and are closely linked (the affected amino acids are at positions 10 and 19, respectively). Whether the association between these mutations and α-pinene content is an artifact of sample size and population structure or a real effect of natural selection is unclear, and its molecular mechanism remains to be studied. A large number of studies have shown that some traits, such as those related to resistance, can be controlled by a single gene (or even a single-site mutation). Most complex traits, such as tree height, weight, and yield, are controlled by multiple genes, among which there may be dominant, additive, epistatic, and interaction effects.
Studies have shown that the synthesis of α-pinene is controlled by a specific synthase; internal mutations of the PeTPS-(-)Apin gene may therefore lead to more active functional sites, thus increasing α-pinene content. However, in actual production, we found that α-pinene content varied with time and location and is therefore a more complex quantitative trait. More candidate genes related to it need to be mined.
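The three amino acid changes can be checked against the standard genetic code. The text does not give the actual codons, so the codons below are illustrative choices only; under the standard code, each of these substitutions (Arg to Pro by G to C, Phe to Tyr by T to A, Arg to Lys by G to A) can only arise from a change at the second codon position.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

def substitute(codon, position, new_base):
    """Return (amino acid before, amino acid after) a single-base change.

    `position` is the 0-based index within the codon.
    """
    mutated = codon[:position] + new_base + codon[position + 1:]
    return CODON_TABLE[codon], CODON_TABLE[mutated]

# Illustrative codons (assumed, not taken from the sequence data).
examples = {
    "CG615 (G>C)": ("CGT", 1, "C"),   # arginine -> proline
    "AT641 (T>A)": ("TTT", 1, "A"),   # phenylalanine -> tyrosine
    "AG3859 (G>A)": ("AGA", 1, "A"),  # arginine -> lysine
}
for name, (codon, position, base) in examples.items():
    before, after = substitute(codon, position, base)
    print(f"{name}: {before} -> {after}")
```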

Selection of plus trees
As demand for processed resin products has exceeded supply, the prices of certain chemical components have risen rapidly each year. Therefore, the supply of industrial raw materials can only be guaranteed by increasing the content of specific components in the resin itself. Although the chemical composition of resin is broadly similar, component contents differ significantly between species and even within a species. For example, in turpentine of P. kesiya, α-pinene accounts for over 90% of monoterpene content, β-pinene content is as high as 25.9%, and there is even a high content of Δ3-carene in rare breeds [27]. In the resin of P. elliottii, pimaric-type acid content is 9.93% and isopimaric acid content is up to 7.6%, whereas isopimaric acid content in P. massoniana is less than 1% [28]. We suggest that chemical composition and content be considered important criteria of resin quality. When evaluating resin quality, one should first consider the content of the expensive ingredients that are widely used in industry [29]. Studies have shown that single resin components are synthesized under the control of single genes, and that their content is controlled by a gene with a dominant effect [30]. These gene variants can be targeted for improvement through artificial selection. The heritability of the main turpentine components ranges from 0.2 to 0.6, and that of α-pinene from 0.3 to 0.5 [2]. In recent years, there has been an increasing number of directional selection and breeding programs targeting resin components in resin-producing pines in China. The breeding targets include α-pinene, β-pinene, Δ3-carene, dipentene, abietic acid, isopimaric acid, pimanthrene, and pimaric acid [2,5]. The application of molecular markers in the directional selection of specific components has not yet been reported.
In this study, three TagSNPs (CG615, AT641 and AG3859) were used for genotyping (Fig 7) and to select P. elliottii with high α-pinene content. Only 10 of the 27 possible genotypes were observed (Fig 7). Six genotypes with high α-pinene content (over 20%) were observed: AaBbCC, AABBCC, AABBCc, AABBcc, aaBbCc, and aabbCC. Among them, the α-pinene content of AABBCC was over 40%. Selecting on AABBCC alone yielded two trees (S2 Schedule, Fig 8). With a selection ratio of 1.82% and a selection differential of 24.32, we achieved a realized gain of 118.0% in α-pinene content. Under this scheme, W0, WP, and turpentine were not reduced, but there was an 8.97% reduction in β-pinene content. Multiple studies have shown a significantly negative correlation between α-pinene and β-pinene contents, which may be related to these two synthase genes and their regulatory genes. The genetic correlation coefficients ranged from 0.36 to 0.46 [1,2,5]. We also tried another selection method. The α-pinene content of AABBCc reached 26%, and it was also considered an excellent genotype. Selecting on these two genotypes (AABBCC and AABBCc) yielded 12 trees (Fig 9, S2 Schedule), with a selection ratio of 10.91% and a selection differential of 9.15. This increased the realized gain in α-pinene content by 44.39% without reducing W0, WP, turpentine, or β-pinene content. In actual industrial production, β-pinene is also an important material, and the scheme can be chosen according to the breeding objective.
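The figures in this paragraph follow from simple counting and ratios, sketched below. Two points are assumptions rather than statements from the text: the gain is taken to be the selection differential divided by the population mean, and the population mean itself is not given here, so it is back-calculated from the first scheme (24.32 / 1.18, roughly 20.6) and then reused to check the second scheme, which comes out close to the reported 44.39%.

```python
from itertools import product

def all_genotypes(loci=("Aa", "Bb", "Cc")):
    """Enumerate every genotype combination for biallelic loci."""
    per_locus = [(up + up, up + low, low + low) for up, low in loci]
    return ["".join(combo) for combo in product(*per_locus)]

def selection_ratio(n_selected, n_total):
    """Selected trees as a percentage of all trees."""
    return 100.0 * n_selected / n_total

def relative_gain(selection_differential, population_mean):
    """Gain as a percentage of the population mean (assumed definition)."""
    return 100.0 * selection_differential / population_mean

genotypes = all_genotypes()
print(len(genotypes), "possible genotypes for three biallelic SNPs")

print(f"Scheme 1 selection ratio: {selection_ratio(2, 110):.2f}%")
print(f"Scheme 2 selection ratio: {selection_ratio(12, 110):.2f}%")

# Population mean is not reported in this section; this value is implied
# by scheme 1 under the assumed gain definition.
implied_mean = 24.32 / 1.18
scheme2_gain = relative_gain(9.15, implied_mean)
print(f"Implied population mean: {implied_mean:.2f}")
print(f"Scheme 2 gain from implied mean: {scheme2_gain:.1f}%")
```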

Conclusions
In summary, the yield and quality of resin are complex quantitative traits. PeTPS-(-)Apin gene variants can be used to select P. elliottii trees with high α-pinene content. Although the selection of plus P. elliottii trees for stable production, high yield, and high-quality resin with high α-pinene content does not depend on the PeTPS-(-)Apin gene alone, this gene can certainly be used as a